The purpose of this document is to inspect the country variable. Participants were asked where they were from and typed their country into a free-text field. Many participants appear to have skipped this question, so we wanted to check whether these values are genuinely missing or whether the country they entered was simply not translated or converted into the correct code.
The data used here come from the global survey, wave 9.
The plot shows only countries with at least 500 participants across all waves. Hovering over a bar shows the exact number of participants (N) per country.
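The filtering behind the plot can be sketched in R as follows, assuming one row per participant and a raw `country` column. The toy data frame `toy` and the helper `countries_above()` with its `min_n` argument are hypothetical; this is not the code used for the original plot.

```r
library(dplyr)

# Hypothetical toy data: one row per participant (not the real survey data)
toy <- data.frame(
  country = c(rep("Canada", 600), rep("France", 520), rep("Taiwan", 40), rep(NA, 700)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count participants per country, keep countries with >= min_n
countries_above <- function(df, min_n = 500) {
  df %>%
    count(country) %>%    # one row per distinct country value, NA included
    filter(n >= min_n) %>%
    arrange(desc(n))
}

top_countries <- countries_above(toy)
top_countries
```

Note that `count()` keeps `NA` as its own group, so a large block of unreported countries would survive the 500-participant filter and could appear as a separate bar next to the real countries.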
We observe that more than 12,000 participants (12,282 in the tables below) did not report their country.
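Blank cells in the rendered tables could be either `NA` or empty strings in the raw data (an assumption; the export format is not described here), so a count of non-reporters is safer if it treats both, plus whitespace-only answers, as missing. A minimal sketch on hypothetical toy data, using a hypothetical helper `is_blank()`:

```r
library(dplyr)

# Hypothetical raw free-text answers; NA, "" and "   " all mean "not reported"
toy <- data.frame(
  country = c("Canada", NA, "", "   ", "Brasil", "Israel"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when the answer is NA or empty after trimming whitespace
is_blank <- function(x) is.na(x) | trimws(x) == ""

missing_counts <- toy %>%
  mutate(country_missing = is_blank(country)) %>%
  count(country_missing)

missing_counts
```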
To understand the nature of this missingness, I inspected all variables containing language or country information.
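library(dplyr)
library(knitr)
library(kableExtra)

global_raw %>%
  select(startlanguage, contains("country")) %>%  # start language plus every country-related column
  slice(1:7, 30:40) %>%                           # a sample of rows: 1-7 and 30-40
  kable() %>%
  kable_styling(bootstrap_options = "hover")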
| startlanguage | country | country_trl | country_iso | country_trans |
|---|---|---|---|---|
| fr | Canada | Canada | Canada | |
| fr | Canada | Canada | Canada | |
| en | Canada | Canada | Canada | |
| es | Colombia | Colombia | Colombia | |
| fr | Canada | Canada | Canada | |
| en | | | | |
| en | Canada | Canada | Canada | |
| en | Canada | Canada | Canada | |
| pt-BR | Brasil | Brazil | Brazil | |
| fr | Canada | Canada | Canada | |
| zh-Hant-TW | 台灣 | Taiwan | Taiwan Province of China | |
| en | Canada | Canada | Canada | |
| tr | Ankara Kurkiye | Ankara | Ankara | |
| en | | | | |
| en | | | | |
| fr | France | France | France | |
| fr | Canada | Canada | Canada | |
| he | ישראל | Israel | Israel | |
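To separate genuine non-response from failed conversion, which is the question this document starts from, each row can be classified by comparing the raw `country` answer with `country_iso`. This is a sketch on hypothetical toy rows that reuse the column names shown above; the helper `is_blank()` and the status labels are illustrative assumptions.

```r
library(dplyr)

# Hypothetical rows mimicking the columns above (not real survey data)
toy <- data.frame(
  startlanguage = c("fr", "en", "pt-BR", "tr", "en"),
  country       = c("Canada", NA, "Brasil", "Ankara Kurkiye", "Kanada"),
  country_iso   = c("Canada", NA, "Brazil", "Ankara", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when a value is NA or empty after trimming whitespace
is_blank <- function(x) is.na(x) | trimws(x) == ""

# First matching condition wins: not reported, entered but not converted, or converted
status_counts <- toy %>%
  mutate(status = case_when(
    is_blank(country)     ~ "not reported",
    is_blank(country_iso) ~ "entered, not converted",
    TRUE                  ~ "converted"
  )) %>%
  count(status)

status_counts
```

This check only catches answers that were left unconverted. A case like `Ankara Kurkiye` becoming `Ankara` above was converted to a value that is not a country, so catching it would require comparing `country_iso` against a list of valid country names.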
Among participants who did not report their country, I inspected the start language. English (30%) and French (29%) together account for about 58% of these cases (7,134 of 12,282), which might suggest that most of these responses were collected in Canada. This is not surprising given the recruitment strategy.
| startlanguage | n | prop |
|---|---|---|
| | 1 | 0% |
| ar | 155 | 1% |
| da | 15 | 0% |
| de | 176 | 1% |
| el | 28 | 0% |
| en | 3625 | 30% |
| es | 989 | 8% |
| fa | 50 | 0% |
| fr | 3509 | 29% |
| he | 324 | 3% |
| hi | 13 | 0% |
| hr | 8 | 0% |
| id | 54 | 0% |
| it | 984 | 8% |
| ja | 101 | 1% |
| ko | 32 | 0% |
| lt | 19 | 0% |
| mr | 11 | 0% |
| ms | 246 | 2% |
| nl | 32 | 0% |
| pt | 34 | 0% |
| pt-BR | 700 | 6% |
| ro | 28 | 0% |
| ru | 193 | 2% |
| sk | 75 | 1% |
| sq | 39 | 0% |
| sr-Latn | 78 | 1% |
| sv | 18 | 0% |
| swh | 13 | 0% |
| tl | 33 | 0% |
| tr | 183 | 1% |
| uk | 2 | 0% |
| vi | 7 | 0% |
| zh-Hans | 203 | 2% |
| zh-Hant-TW | 304 | 2% |
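A table like the one above can be produced in R along these lines. The original code is not shown, so the definition of missing (NA or blank) and the rounding to whole percentages are assumptions, and the data are a hypothetical toy example.

```r
library(dplyr)

# Hypothetical toy data (not the real survey)
toy <- data.frame(
  startlanguage = c("en", "en", "fr", "fr", "fr", "es", "en", "it"),
  country       = c(NA, "", NA, "Canada", "", NA, "Canada", NA),
  stringsAsFactors = FALSE
)

# Keep participants whose country is NA or blank, then tabulate start language
lang_missing <- toy %>%
  filter(is.na(country) | trimws(country) == "") %>%
  count(startlanguage) %>%
  mutate(prop = paste0(round(100 * n / sum(n)), "%"))  # whole-number percentages

lang_missing
```

Rounding each share separately is why small groups show 0%, and why the rounded English and French shares (30% and 29%) do not add up exactly to their unrounded combined share.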

Broken down by survey wave, most of these participants (66%) come from wave 1:

| wave | n | prop |
|---|---|---|
| 1 | 8080 | 66% |
| 2 | 1455 | 12% |
| 3 | 1167 | 10% |
| 4 | 420 | 3% |
| 5 | 444 | 4% |
| 6 | 268 | 2% |
| 7 | 154 | 1% |
| 8 | 204 | 2% |
| 9 | 90 | 1% |
## # A tibble: 0 × 413
## # … with 413 variables: rowid <int>, id <chr>, submitdate <dttm>,
## # lastpage <chr>, startlanguage <chr>, seed <chr>, startdate <dttm>,
## # datestamp <dttm>, refurl <chr>, lang <chr>, status <dbl>, country <chr>,
## # sex <dbl>, age <chr>, edu <dbl>, cempstat_sq001 <dbl>,
## # cempstat_sq002 <dbl>, cempstat_sq003 <dbl>, cempstat_sq004 <dbl>,
## # cempstat_sq005 <dbl>, cempstat_sq006 <dbl>, cempstat_sq007 <dbl>,
## # cempstat_sq008 <dbl>, emplstat_sq001 <dbl>, emplstat_sq002 <dbl>, …
This document was prepared by UK. Reach me with any questions or comments!